
[BugFix] Ensure num_cached_tokens is non-negative for kv transfer failed requests #37354

Closed
KingsleyZhang123 wants to merge 1 commit into vllm-project:main from KingsleyZhang123:Failed-kv-transfer-request-handling

Conversation

KingsleyZhang123 (Contributor) commented Mar 17, 2026

Purpose

Bug fix for handling requests whose KV load fails.
For a request that fails KV load on the decode side, the request is still in the WAITING_REMOTE_KV state, so its num_cached_tokens is still the default -1 and never gets updated. When we log the local_cache_hit metric, that -1 is used and crashes the engine with:

ValueError: Counters can only be incremented by non-negative amounts.
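
A minimal sketch (not the vLLM metrics code) of why this crashes, assuming a plain prometheus_client Counter and an illustrative metric name:

    # Not the actual vLLM metrics path: a minimal sketch with a plain
    # prometheus_client Counter; the metric name here is illustrative.
    from prometheus_client import Counter

    local_cache_hit = Counter(
        "local_cache_hit_tokens", "Tokens served from the local prefix cache"
    )

    num_cached_tokens = -1  # default for a request that never left WAITING_REMOTE_KV

    # local_cache_hit.inc(num_cached_tokens)
    #   -> ValueError: Counters can only be incremented by non-negative amounts.

    local_cache_hit.inc(max(0, num_cached_tokens))  # clamping avoids the crash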

Test Plan

Ran stress testing and intentionally failed random KV transfers to surface the bug.

Test Result

Before:

The logger crashes with ValueError: Counters can only be incremented by non-negative amounts.

After:

No crash observed


gemini-code-assist (Bot) left a comment


Code Review

This pull request addresses a ValueError that occurs when logging metrics for requests that fail KV transfer. The num_cached_tokens for such requests can remain at its default value of -1, which causes a crash when used to increment a counter. The fix applies max(0, ...) to ensure num_cached_tokens is always non-negative when creating the EngineCoreOutput for finished requests. This is a direct and effective solution to the problem. The change is correct and I have no further suggestions.
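
As a rough illustration of the shape of that fix (a simplified stand-in, not the actual EngineCoreOutput class or its fields):

    # Simplified, illustrative stand-in for EngineCoreOutput; the class and
    # field names here are not the actual vLLM API.
    from dataclasses import dataclass

    @dataclass
    class FinishedOutputSketch:
        request_id: str
        num_cached_tokens: int

    def build_finished_output(request_id: str, num_cached_tokens: int) -> FinishedOutputSketch:
        # A request that fails KV load while still in WAITING_REMOTE_KV carries
        # the default -1 here; clamping keeps downstream metric increments valid.
        return FinishedOutputSketch(
            request_id=request_id,
            num_cached_tokens=max(0, num_cached_tokens),
        )

    print(build_finished_output("req-0", -1))  # num_cached_tokens=0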

markmc (Member) left a comment


Hold because we're working through this in #36859

markmc (Member) commented Mar 18, 2026

If you have clear instructions for reproducing this issue, that would be helpful. Thanks

KingsleyZhang123 (Contributor, Author) commented:

> Hold because we're working through this in #36859

OK, I think this commit works as well.
To reproduce, use a P/D disaggregated KV connector with async_kv_load set to True and manually fail the KV transfer, so that the request is marked as failed while it is still in the WAITING_REMOTE_KV state, where its num_cached_tokens is still -1.

chenminghua8 commented:

> If you have clear instructions for reproducing this issue, that would be helpful. Thanks

Use an A100 80G to start vLLM:

    LMCACHE_LOG_LEVEL=ERROR \
    LMCACHE_LOCAL_CPU=True \
    LMCACHE_MAX_LOCAL_CPU_SIZE=35 \
    LMCACHE_CHUNK_SIZE=256 \
    LMCACHE_USE_EXPERIMENTAL=True \
    PYTHONHASHSEED=0 \
    CUDA_VISIBLE_DEVICES=1 \
    vllm serve /data/model/Qwen2.5-32B-Instruct \
        --served-model-name Qwen2.5-32B \
        --max-model-len=20000 \
        --gpu-memory-utilization 0.9 \
        --tensor-parallel-size 1 \
        --max-num-batched-tokens 4096 \
        --load-format dummy \
        --port 8500 \
        --enable-prefix-caching \
        --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'

Multiple rounds of session testing:

    python3 LMCache/benchmarks/multi_round_qa/multi-round-qa.py \
        --num-users 120 \
        --num-rounds 5 \
        --qps 6 \
        --shared-system-prompt 0 \
        --user-history-prompt 1024 \
        --answer-len 200 \
        --model Qwen2.5-32B \
        --base-url http://localhost:8500/v1

markmc moved this from Backlog to In Review in Metrics & Tracing on Apr 8, 2026
markmc (Member) commented Apr 8, 2026

The issue has been fixed on main since #37160 introduced this band-aid:

        self.local_cache_hit += max(
            0, (num_cached_tokens + recomputed - num_external_computed_tokens))

#37460 is the current candidate for a long-term fix

markmc closed this on Apr 8, 2026
markmc moved this from In Review to Not planned in Metrics & Tracing on Apr 8, 2026

Labels

bug (Something isn't working), v1

Projects

Status: Not planned


4 participants